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We assess the reliability of the recently developed approach denominated Dominant Reaction 
Pathways (DRP) by studying the folding of a 16-residue /3-hairpin, within a coarse-grained Go- 
type model. We show that the DRP predictions are in quantitative agreement with the results of 
Molecular Dynamics simulations, performed in the same model. On the other hand, in the DRP 
approach, the computational difficulties associated to the decoupling of time scales are rigorously 
bypassed. The analysis of the important transition pathways supports a picture of the /3-hairpin 
folding in which the reaction is initiated by the collapse of the hydrophobic cluster. 
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In this work we address the problem of identifying the statistically important pathways in thermally activated 
conformational transitions of macro-molecules. If such reactions involve overcoming free-energy barriers, straight- 
forward molecular dynamics (MD) simulations are very inefficient and often impracticable, due to the decoupling 
of the microscopic time scale associated to local conformational changes and the macroscopic time scale related to 
the inverse reaction rate. For example, in the case of protein folding, MD simulations can only access times of few 
ns, whereas the entire folding processes generally take milliseconds or longer. As a consequence of the existence of 
free-energy barriers between folded and unfolded states, all the computational time in MD simulations is invested for 
PQ describing the motion of the system, when it is exploring the portion of configuration space associated to the stable 
and meta-stable states. 

. In a series of recent works [TJ |51 [3] , we have developed a theoretical/computational framework, which we have 

r& called Dominant Reaction Pathways (DRP), to rigorously identify the statistically significant pathways in thermally 

activated reactions of systems with a large number of degrees of freedom. Unlike other approaches in the literature, 
1— 1 the DRP method does not involve any uncontrolled ad hoc approximation and does not require any a priori choice of 

reaction coordinates. 

Before applying the DRP method to make quantitative predictions for complex molecular transitions, we need to 
show that it produces results consistent with those of MD simulations, obtained in the same model. A first validation 
analysis of this of this type was performed in [2], where we studied the conformational transitions of a very simple 
£ — molecule — alanine dipeptide — using the all-atom GROMOS96 force field. We observed an excellent agreement with 
the results of MD simulations, performed in the same model. On the other hand, the computational gain of adopting 
the DRP method was impressive: the characterization of the relevant transitions in atomistic detail took just few 
minutes on a regular desktop, while about a week of calculations on the same machine was required, in order to extract 
the same amount of information, from the standard MD techniques. These results suggest that the DRP method can 
be an efficient tool to characterize isomerizations, allosteric transitions and in general reactions connecting well-defined 
j>. stable molecular conformations . 

On the other hand, if either the initial or the final state has a very large conformational entropy, the characterization 
of the reaction becomes much more complex. A prototype of this class of problems is again the study of protein folding. 
Proteins can fold in an uncountable infinity of ways, corresponding to the infinite set of denatured configurations form 
which the reaction can be initiated. In this case, in order to extract significant information from either MD simulations 
or DRP calculations, one necessarily needs to perform an average over a representative set of transitions, which reach 
the same native state starting from different denatured configurations. Hence, the question arises whether one needs 
to average over an exponentially large number of folding trajectories or if a finite — and possibly relatively small — 
number of independent trajectories are sufficient to characterize the folding. Clearly, in the first scenario, any approach 
which can at most provide a handful of folding transitions looses its practical utility. 

In this work, we address this issue by studying the folding of a 16-residue chain — /3-hairpin fragment of GB1 — , 
using a Go- type reduced model. The dynamics of the Go model is sufficiently simple to allow many folding-unfolding 
trajectories to be generated by MD simulations, at an affordable computational cost. On the other hand, this model 
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FIG. 1: 16-residue C-terminus of protein G-Bl (PDB code 2gbl). 



displays some of the complexity which we expect to face, in more realistic and detailed all-atom simulations of protein 
folding. In particular, the reaction takes place in a high-dimensional funneled energy landscape in which the native 
state has a small conformational entropy, while the denatured conformations visit a large portion of configuration 
space. 

The direct comparison with the results of simulations shows that the DRP approach remains accurate and com- 
putationally efficient also for protein folding, at least for the simple model used. In fact, we can accurately predict 
the configurations which are most probably visited during the unfolding/refolding reaction, by averaging over the 
dominant reaction pathways corresponding to just few initial denatured conformations. 

In the final part of this work, we study the folding mechanism for the hairpin under consideration. We find that 
our analysis favors a picture in which the folding is initiated by the collapse of an hydrophobic cluster. This result 
agrees with that of MD simulations performed in the same model and with most of the existing theoretical analysis, 
both in the context of coarse-grained and all atom models. Interestingly, florescence experiments suggest a different 
mechanism, in which the folding is initiated at the turn (see [U [TS] and references therein) . 

The paper is organized as follows. In section [TTJ, we introduce the Go-type model used in the MD and DRP 
calculations. In section |III[ we briefly review the DRP approach — for a detailed and self-contained presentation, we 
refer the reader to [5] — , while some general and qualitative aspects of the stochastic dynamics in such a model are 
discussed in IV In section [V] we present the algorithm used to characterize the ensemble of statistically significant 



folding transitions, including the effects of thermal fluctuations. The DRP results for the folding reaction of the 
/3-hairpin are presented in section |VI) where they are compared with the predictions of MD simulations. Section |VII| 
is devoted to a discussion of the qualitative folding mechanism, which emerges from our DRP and MD calculations. 
The main results and conclusions are summarized in section fVIIII 

II. GO-TYPE MODEL FOR THE FOLDING OF A /3-HAIRPIN 

In this work, we consider the folding reaction of the 16-residue C-terminus of protein G-Bl (PDB code 2gbl), with 
sequence 

GLY-GLU-TRP-THR-TYR-ASP-ASP-ALA-THR-LYS-THR-PHE-THR-VAL-THR-GLU 

In solution, this fragment is known to assume the stable /3-hairpin conformation shown in Fig. [T] 

Fluorescence experiments performed by Eaton and coworkers [4] have shown that this hairpin exhibits two-state 
kinetics between the folded and unfolded state, with a relaxation time of 6 ^s. In the last decade, this peptide has 
become a standard test system, to investigate the thermodynamics and the kinetics of /3-sheet formation (see e.g. [5] 
and references therein). 

In order to be able to compare the results of MD and DRP calculations, we must adopt a model which is sufficiently 
simple to be investigated with either methods, at an affordable computational cost. Hence, we consider a Go-type 
coarse-grained representation [5], defined by the interaction 
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where N p = 16 is the number of amino-acids, a = 0.38 nm is the equilibrium distance between consecutive a- 
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FIG. 2: Free energy landscape in three surfaces identified by the parameters di—ie, Rh, and N c 



carbons on the chain, a = 0.25nm is the residue's size 1 e = 5kJ/mol is the maximum energy gain per contact and 
Kf, = 2 x 10 3 kJnm~ 2 /mol is the elastic constant of the harmonic bond. The matrix Cy selects the native contacts: 
it is set equal to 1 if the native distance between residues i and j is less than 0.65 ran, and is set to otherwise. 

In this study, we chose to keep the interaction as simple as possible and do not include Coulomb, angular or torsional 
terms in (0. The energetic bias toward the native state is effective only when the monomers come close, since the 
non-bonded interaction in becomes negligible for distances greater than approximatively one ran. 

The dynamics of the hairpin in a thermalized heat-bath can be described by 16 coupled Langevin Eq.s. 



xi = 



D 
D 

k^r 



ViU(xi, ...,x Np ) + r]i(t), 
Vjv„^(xi, ...,x Np ) + n Np (t). 



where rji(t) are usual Gaussian noise functions, obeying the fluctuation-dissipation relationship 



(rn(t)ri J (p)) = 6D8(t)6 t 



(2) 



(3) 



and D — 1.2 x I0~ 3 nm 2 ps~ 1 is the diffusion coefficient of a typical amino-acid in water, at room temperature. Notice 
that in the original Langevin Eq. there is an acceleration term, rax. However, as it was shown in [7], for proteins, the 
effect of including such a term becomes negligible for time scales larger than the fraction of the ps. 

Let us now discuss the key features of the dynamics of such a model, which can be inferred from performing 
numerical MD simulations 2 . To this end, let us focus on the following set of order parameters: 

• The distance c?i-i6 between the C and N terminus of the chain 

• The angle <fi at the turn of the hairpin, i.e. between the Asp, Ala and Lys residues, 
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where is the distance between the i-th and j-th residues in the sequence. 

• The fraction of native contacts 

• The radius of gyration Rh of the group defined by the hydrophobic residues Phe, Tyr, and Trp. 

We stress the fact that the choice of this specific set of parameters in only functional to the interpretation of our 
results. Neither MD simulations nor the DRP method require any a priori choice of reaction coordinates. Fig. [2] 
displays the free energy landscape in three sufaces selected by the above order parameters, calculated from ~ 10 
steps of MD, with an integration time-step of O.Olps. An example of a typical folding transition observed during a 



1 Since we were interested in the validation of the DRP method, and not on the phenomenological implications, in this work we chose to 
adopt a rather small hard-core radius, in order to make numerical simulations faster. For example, the acceptance rate of Monte Carlo 
simulations varies significantly with a. 

In this work, we shall improperly call MD simulation the numerical integration of the Langevin Eq. (B in the so-called "Ito Calculus". 



FIG. 3: An example of folding transition observed during a MD simulation, plotted in the plane selected by the hard-core 
radius Rh and the turn angle <f> 



MD simulation is displayed in Fig|3] in the plane selected by the hard-core radius Rh and the turn angle 4>. In Figj4] 
we show the evolution of the potential energy of the chain, during a 30ns long fraction of an MD simulation, at room 
temperature. 

These results show that this Go-type model displays a unique, thermodynamically stable fold. The unfolded 
configurations do not form a thermodynamically stable state, and the folding reaction proceeds downhill. A closer 
look on the three free energy profiles in Fig|2] reveals that in each profile there is a region in which the free energy is 
nearly flat. The corresponding region of configuration space is that in which the denatured chain spends the longest 
time. Hence, the most probable unfolding reaction pathways are those which leave the native state, travel along the 
valley of the free energy profile and reach the regions where the free energy is almost flat. In section VI we shall 
see that the DRP method is able to predict and accurately locate such most probable trajectories, without any prior 
knowledge of the free energy profile. 



III. THE DRP APPROACH 



The starting point of the DRP approach is the path integral representation of the Fokker-Planck conditional 
probability to perform a transition from an initial configuration x% to a final configuration in a time t, i.e. 

P(x f ,t\ Xi ,0) = e as* / Vx(t) e J ° \ 4D ( n ) . (5) 

J Xi 

D is a constant diffusion coefficient, and V e ff(x) is an effective potential defined as 

Ve/f{x) = 4(fc ^ r)2 ((VU(x)) 2 - 2k B TV 2 U(x)) , (6) 

x = (xi, ...,xjv ) is a vector specifying the position of all the N p constituents of the molecule (atoms or amino-acids) . 
In the specific case of the study of the protein folding reaction, the initial configuration Xi may be taken in the 
denatured state, while the final configuration Xf = xjq lies in native state. 

The DRP approach is based on the saddle-point approximation of the path integral (|5| . The idea is to functionally 

expand around the minima of the effective action S e ff — Jq dr ^ + V e ff[x(r)]\ . The minimum action paths are 

the most probable trajectories — or dominant reaction pathways — which were first discussed in ^ |5] and The 
key advantage of the DRP method is that the dominant reaction pathways can be very efficiently determined, by 
exploiting the fact that they are solutions of a symplectic classical equations of motion, 

±x(t) = VV eff (x). (7) 
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FIG. 4: Time evolution of the potential energy during a fraction of a MD simulation. 

The solutions of ^ conserve an effective "energy" E e ff defined as 3 

Eef f = ^i 2 -V eff (x). (8) 

As a consequence, the action S e ff evaluated along a dominant reaction pathway can be re-written as 

S e ff(x f , x i} t) = -E(t)t + S H j(xf,Xi;E eff (t)), (9) 
where SHj(xf,Xi]E e ff) is the Hamilton-Jacobi (HJ) functional defined as 

S B j(x f , Xi ; E eff (t)) = [ ' dlJE eff (t) + V eff [xi(l)], (10) 



where dl = V dx 2 measures the infinitesimal displacement of all the constituents in an elementary path step. 

Each choice of the effective energy selects the time taken by a single transition, from the initial configuration Xi to 
the final configuration x / , according to the well known relationship 

f Xf 1 

t= / dl -. . (11) 

L t ^J±D (E eff (t) + V eff [xi(l)]) 

In a protein folding transition, the final state x(t) should be taken as the native state of the protein Xf, that is the 
absolute minimum of U(x), or possibly a minimum of V e ff(x) close to it. In order for the protein to stay in this state 
as long as possible, the velocity of the system at that point should be x(t) = Xf — 0, which implies 

Eeff = lD ±2{t) " V 'ff&W = -^//(^))- ( 12 ) 



The dominant reaction pathways are practically found by minimizing numerically a discretized version of Eq. (10 1 

AT.-l 

Sh.j = 



HJ = -j= J2 yJ(y eff (n)-V eff (l))Al n , n+1 + XP, (13) 

V ^ 



Note that E e f t has the dimension of a rate and therefore does not have the physical interpretation of a mechanical energy. 
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where N s is the total number of discretization steps and 

N 



V ef f(n) 
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5^V J U'(xi(n),...,Xj V (n)) -2fc B T^V, 2 (7(x 1 (n),...,x J v(n)) 



P = £ (AZ n , n+1 - (AZ)) 5 



= ^(x 4 (n+ 1) -x,(n)) : 



(14) 

(15) 
(16) 



iV p is the number of particles (atoms or residues) in the molecule, A l n , n +i is the Euclidean measure of the n — th 
elementary path-step and A P is a penalty function which keeps all the length elements close to their average [5] 
and becomes irrelevant in the continuum limit, TV — > oo. For any finite number of path discretization steps, this 
term enforces the condition that the relevant transitions are sampled at constant displacement steps, rather than at 
constant time steps. This difference is crucial, because it is responsible for the computational advantage of the DRP 
approach. In fact, the total Euclidean distance to be covered to transit from a coil configuration to the native state is 
typically only 1 order of magnitude larger than the most microscopic length scale, i.e. the typical monomer (or atom) 
size. As a consequence, only 30 — 100 discretized displacement steps are generally sufficient for convergence [TJ [2]. 
This number should be confronted with the O(10 12 ) time discretization steps which would be required to accurately 
describe a ms-long folding transition, using uniform time steps, by sampling directly the integral ^ or by performing 
a long MD simulation. The physical reason for such an impressive simplification is that the DRP formalism avoids 
investing computational time to describe the time evolution of the system when it is not progressing towards the final 
state. 

Once the dominant pathways have been determined, it is possible to systematically account for the effects of 
quadratic thermal fluctuations around them, by means of the Monte Carlo algorithm [5]. Let us denote by 



x{n) = (xi(n), ...,xjv_(n)), 



(17) 



with n = X,..,N a dominant trajectory, i.e. a sequence of molecule configurations determined by minimizing numer- 
ically the discretized HJ action (131. The time at which each configuration x(n) is visited during the transition can 



be obtained by computing the set of time intervals separating each of the path steps x(n) from x(n + 1): 

AZ„ jn +i 



At, 



71,71 + 1 



y/4D(V eff (x(n))-V eff (x(l))Y 



(18) 



The sequence of time intervals obtained this way can be used to write a discretized version of the transition 
probability ([5|: 



N-l 



P(x f ,x t ;T(N))= / Y[ [dx 1 {n)...dx Np {? 
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) -) 2 + V ef f( Xl (n),..., XNp (n))] 



(19) 



We stress the fact that in Eq. (19) the time intervals are chosen large (small), when the most probable trajectory at 
that time is evolving slowly (fast). In other words, the information encoded in the dominant reaction pathway has 
been used to identify a particularly convenient discretized representation of the path integral §5§ , in which the sizes 
of the time steps are adapted according to the evolution of the most probable pathways. 

Some comments on the discussion made so far are in order. First of all, it is important to emphasize that the 
saddle-point approximation underlying the DRP approach is reliable only in the small temperature limit, in which 
the dominant pathways have little overlap and the contribution arising from the thermal fluctuations around them 
leads only to small corrections. Hence, an important question which will be addressed below is if this condition is 
satisfied in the case of protein folding, at room temperature. A second observation is that, from the computational 
point of view, the Monte Carlo sampling of the ensemble of fluctuations around each dominant reaction pathway in 
(19 1 is essentially as expensive as finding the saddle-point path itself. In fact, in both cases one needs to sample a 



space with 3N p x N degrees of freedom. Finally, we note that, in order to single out the next-to-leading corrections 
in the saddle-point approximation of the Fokker-Planck conditional probability, one should first expand the action 
to quadratic order around the dominant reaction pathways and then perform the numerical Monte Carlo sampling. 
However, in the regime of validity of the saddle-point approximation, this is not necessary. In fact, the contribution 
to the Fokker-Planck conditional probability ^ of the cubic, quartic and higher terms lead to corrections which come 
in at next-to-next-to-leading order. 
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IV. DYNAMICS OF THE DOMINANT REACTION PATHWAYS 



As discussed in section |III| the saddle-point paths satisfy the equations of motion associated to the effective action 

■V/-= / 'H > —xf I +K//(x 1 ,...,x JV ), (20) 



where the effective potential V e j / is defined in Eq. (16]) . 

The exponent of the effective potential e~ v " ff ^ x '^ t represents the probability for the chain to retain the same 
configuration x, during an infinitesimal time-interval dt. In particular, the first term in (|6]) controls the tendency 
of the protein to rapidly evolve away from regions of configuration space where the forces are very large. We shall 
refer to this term as to the force contribution to the effective potential. The second term in ([6| reduces the value of 
V e ff in the vicinity of the narrow minima of the physical potential U(x), where the Laplacian is large and positive. 
This means that the chain will be trapped for long times in regions of configuration space characterized by a small 
conformational entropy. For this reason, we shall refer to such a term as the entropic contribution to the effective 
potential V e ff. 

We note the fact that other approaches proposed in the literature to sample the relevant trajectories of Langevin 
dynamics have ignored the entropic contribution to the effective potential [10] . However, as it was clearly shown in 
[llj . the inclusion of the entropic term is crucial in order to correctly identify the saddle-point pathways. In this 
section, we take a closer look to the role played by the entropic term in the specific case of molecular interaction. In 
particular, we estimate its magnitude, relative to the force term, at room temperature. 

Let us consider the effective potential associated to the interaction 0, between a single pair of residues. We begin 
by observing that the entropic contribution coming from the harmonic interaction between consecutive residues on 
the chain reduces to an irrelevant constant, and therefore can be dropped. One the other hand, the non-bonded 
interaction in ([I]) one has 



v eff(r) 
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(21) 
(22) 
(23) 



The numerical value of the force and entropic contributions to v e ff(r) for different relative distances r are presented 
and compared in Fig(5| We note that, at distances smaller than the Van-der-Waals radius, the effective potential is 
dominated by the force contribution. This is expected, as configurations in which two residues have some overlap are 
highly unstable under Langevin dynamics, because the effect of the force drift is very large. On the other hand, it 
is interesting to observe that, outside the hardcore, the effective potential is dominated by the entropic contribution. 
In particular, entropic effects are responsible for a minimum of the effective potential at distances of the order of 
the Van-der-Waals radius. Also this fact is easily interpreted: the configurations in which two monomers get close 
to each other without overlapping have a relatively long residence time, in Langevin dynamics. In fact, the short- 
range repulsion prohibits the system to evolve into configurations in which the two monomers come closer, while the 
Van-der-Waals attraction disfavors the escape toward configurations in which the monomers are further separated. 

From this discussion it follows that the stochastic dynamics of the polypeptide chain is the result of a delicate 
competition of entropic and energetic effects. Approaches in which one retains only the force term miss this important 
feature of the stochastic dynamics generated by the Langevin equation. 



V. DRP CALCULATION 



The DRP method yields the important reaction pathways connecting two well-defined system configurations, which 
have to be provided independently. Hence, the first step to set up a DRP calculation consists in choosing the boundary 
conditions of the conditional Fokker-Planck probability (|5|. In many reactions of interest, such as cis-trans isomer- 
ization or allosteric transitions, both the initial and final configurations are well-defined and can be experimentally 
determined. However, in the specific context of the study of the protein folding reaction, the denatured conformations 
populate a huge portion of the configuration space and cannot be determined experimentally. Under such conditions, 
one has to average over the transitions between a representative set of different denatured conformations and the 
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FIG. 5: Comparison between the force and entropic contribution to the effective potential for a non-bonded Van-der-Waals 
type of interaction, Eq.( |23| |. 



native state. The DRP retains its computational usefulness only if the folding transition can be accurately charac- 
terized by averaging over a relatively small number of denatured configurations We generated the set of denatured 
configurations by performing ~ 10 4 steps of MD simulation at 500i"T, starting from the native state, followed by few 
thermalized MD steps at 3007^. This way, we observed a relatively rapid unfolding of the chain. 

The next step in DRP calculations consists in finding a representative set of dominant pathways, by minimizing 



numerically the HJ action (10), for each of the boundary conditions which have been previously determined. Any 
algorithm for the minimization of the HJ functional has to be provided with a starting path connecting the initial and 
final points, from which it begins the search for the most probable trajectories. The trajectories used to generate the 
denatured configuration by high-temperature MD simulations represent a natural and convenient choice, which has 
the advantage to avoid residue overlap, covalent bond breaking, etc... However, such MD trajectories contain a very 
large number of discretization steps Nmd ~ 10 4 , while the main advantage of the DRP method is that it allows to 
keep only a small number of path steps Ndrp, typically of the order of 30 — 100. Consequently, before beginning the 
minimization procedure, one needs to reduce the number of path steps in the initial MD trajectory from Nmd down 
to Ndrp ^ Nm d ■ This was done by applying the iterative smearing algorithm introduced in [3] : at each iteration, we 
defined a new path {x'(i)}j = i...jv MD; consisting of Nmd /I steps, in which each configuration was defined as average 
of two configurations of the input path at consecutive time steps, i.e. 

x'(i) = i(x(2i + l)+x(2i)). (24) 

After few of such decimation steps, we obtained a smeared periodic trajectory consisting of Nd rp = 50 steps. 

Occasionally, the averaging involved in the smearing algorithm led to paths which contained configurations in which 
the hardcore of some of the residues overlap. Under such circumstances, we observed a huge increase of the HJ effective 
action and the rejection rate of the relaxation algorithm was very large. In order to accelerate the convergence of the 
optimization algorithm, we ran a short preliminary energy minimization of each of the configurations in the starting 
path. 

The relaxation of the HJ action was performed via a simple adaptive simulated annealing algorithm. After a 
preliminary thermalization phase, based on the Metropolis algorithm, we performed 100 cooling cycles, consisting of 
10 steps each. Each move was obtained by up-dating the path globally, i.e. moving the positions of all the particles, 
in each configuration of the path. At the end of each cooling cycle, the boldness of the moves was adjusted, in order 
to keep the rejection rate between 50% and 90%. 

After generating the set of dominant reaction pathways, we sampled the ensemble of thermal fluctuations around 
each of them according to the algorithm introduced in [3] and illustrated in section |III[ i.e. by sampling the path 



integral (19) through a Monte Carlo simulation, using the saddle-point paths determined before as the starting points 
of the Markov chains. We adopted the Metropolis algorithm and sampled 10 thermal fluctuations for each dominant 
reaction trajectory. 
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FIG. 6: Structure of the thermal fluctuations around a dominant reaction pathways. The solid line represents the local dominant 
reaction pathways, the dotted lines represent 10 independent thermal fluctuations around the dominant reaction pathways. 



VI. RESULTS AND DISCUSSION 



In this section we analyze the results obtained with the DRP method. We study the ampitude of thermal fluctuations 
around the dominant reaction pathways obtained at room temperature, and we compare the outcome of the DRP 
analysis with the results obtained from MD simulations. 



A. The consistency of the saddle-point approximation 

The first question we address is whether the saddle-point approximation underlying the DRP approach is consis- 
tent, when applied to the study of the chain's conformational transitions, in the present model. A first necessary 
condition is that, at room temperature T = 300-ftT, thermal fluctuations around the dominant pathways must be 
small. More precisely, including quadratic fluctuations should provide only small corrections to the leading-order pre- 
dictions. A second necessary consistency condition is that thermal fluctuations around different saddle-points must 
not significantly overlap 4 [3]. 

In order to verify if the first of such conditions is fulfilled, in Fig. [6]we compare different types of trajectories, in the 
plane selected by the Rh and d\-iQ reaction coordinates. The solid line represents the result of the minimization of 
the HJ action, i.e. a dominant reaction pathway. The dotted lines are 10 independent thermal fluctuations obtained 



by 2000 steps of Monte Carlo sampling of the path integral (19 1, starting from the same dominant trajectory. We 
observe that the thermal fluctuations shown in Fig(6j remain close to and qualitatively similar to the nearby saddle- 
point path. This feature is not altered when one considers longer Markov chains in the Monte Carlo sampling of the 



thermal fluctuations, via the path integral (19 1 



Let us now discuss the second consistency condition, i.e. whether thermal fluctuations associated to different 
dominant reaction pathways significantly overlap. If this was the case, then we would expect that the relaxation of 
the HJ action performed starting from fluctuations associated to a given saddle-point would occasionally converge 
to a different saddle-point. However, for all the 8 dominant pathways considered, the annealing of the HJ action 
performed starting form each of the corresponding thermal fluctuations converged very rapidly to the original saddle- 
point trajectory. This fact gives us some confidence that the thermal fluctuations associated to different minima of 
the HJ action do not significantly overlap and that the saddle-point expansion underlying the DRP method is working 
well. 



4 Note that these conditions are the analog of those required for the validity of the semi-classical approximation in Quantum Mechanics. 
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FIG. 7: Comparison between MD and DRP calculations for three pairs of order parameters (di_i6 vs Rh, Rh vs <j) and Rh 
vs N c . The free energy landscape has been obtained by MD simulations, while the line represents the trajectory obtained 
averaging over 8 dominant reaction pathways.. 



B. Comparison with MD 

A fundamental condition for the DRP analysis to be useful is that it is possible to reliably identify the regions of 
configurations space which are most probably visited during the folding reaction by averaging over a moderate number 
of independent dominant reaction pathways trajectories. To check if this is the case for the present model, in Fig(7]we 
compare the results of DRP simulations with the free energy landscapes obtained by long MD trajectories, in three 
surfaces selected by the order parameters di-ie, Rh, N c and (j>. The line represents the most probable trajectory, 
which was obtained averaging over only 8 independent dominant reaction pathways. In total, this calculation required 
about 6 CPU hours. 

We can see that the DRP method correctly identifies the regions of configuration space which are most likely 
explored by the MD trajectories. In fact, the line obtained averaging over the 8 initial configurations travels along 
the valley in the free energy profile until it approaches a region where the free energy is nearly flat. It should be 
stressed that the DRP calculation does not need to make any assumption about the structure of the free energy, nor 
it requires any choice of reaction coordinate. The trajectories obtained from averaging over the dominant reaction 
pathways are parameter free predictions and the agreement with MD results is observed in all the combinations of 
the selected set of reaction coordinates. 
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FIG. 8: Statistical errors on the trajectory obtained by averaging over 8 dominant reaction pathways in three surfaces selected 
by the parameters di_i6, N c and 4>. 



An important related question to address is what precision can be achieved by averaging over a moderate number 
of initial configurations. To answer, in Fig(8]we plot the error on the average, obtained from the standard deviation 
calculated from the 8 trajectories. Even with such a small number of trajectories, the statistical error on the average 
is rather small, compared with the scale at which the free energy profile varies significantly. 

We stress the fact that the number of dominant pathways considered is almost three orders of magnitude smaller 
than the number of unfolding/refolding transitions which were required to obtain the free energy landscape from MD 
simulations. 

In Figj9]we plot the potential energy U(x), the effective potential V e ff (x) and the residence time along a typical 
path. From the left panel, we can see that the transition involves overcoming several energetic barriers. The fact that 
the DRP method finds the optimal path in the free energy landscape while paying a price in energy suggests that 
the contribution of entropic effects is significant. This fact is confirmed by the plot presented in the center panel of 
Fig|9] , where the force and entropic contributions to the effective potential are compared and found to be of the same 
order (see discussion in section IV and in [H]). The relaxation of the HJ action drifts the paths towards regions of 
configurations space where the forces are small and the Laplacian is large and positive. The third panel of Fig. [9] 
displays the residence time along a typical dominant trajectory. We see that the residence time diverges in the vicinity 
of the native state, while the molecule spends very little time in each of the denatured conformation. This is what 
one expects from general arguments. In fact, equilibrium experiments indicate that at room temperature, typically 
about 80% of the chains are in the (almost unique) native conformation, while the remaining 20% arc in denatured 
conformations. Since the conformational entropy of the denatured states is huge, the time spent by the chain in each 
denatured conformation must be exponentially small. 



VII. FOLDING MECHANISM FOR THE /3-HAIRPIN 



Although the focus of the present work is on assessing the reliability of the DRP method, it is instructive to discuss 
the folding mechanism suggested by the present simple Go-type model. This section can be considered as preparation 




Fraction of the until ciml'oniiaitonal transiiton Fraction of the Tola] Conformational Transition Fraction of the Total Conformational Iraiisition 



FIG. 9: Evolution of the total potential energy U(x) (left panel), effective potential V e ff{x) (center panel) and residence time 
(right panel) along a typical dominant reaction pathways. 
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Fraction of conformational change 



FIG. 10: Evolution of the order parameters Rh and of the fraction of native contacts N c as a function of the fraction of 
conformational changes, averaged over 10 independent dominant reaction pathways trajectories. 



work for a forthcoming analysis, which will be based on atomistic force fields, as it was done in [5] for alanine dipeptide. 

The problem of identifying the mechanism responsible for the folding of a /3-hairpin has been the subject of an 
intense theoretical and experimental investigation (see e.g. ^ and references therein). Eaton and co-workers have 
proposed a picture in which the stabilizing hydrogen bonds on the backbone are formed one by one, and are initiated 
at the turn (so called "zipper" mechanism) [J]. On the other hand, several theoretical simulations performed in 
simplified models [T^] and in all-atom models with implicit [13] and explicit [TOl E] solvent seem to favor a different 
folding mechanism, in which after the initial (3 turn, the hydrophobic residues collapse into a hydrophobic core, which 
is then followed by stabilizing bond formations. 

The two pictures lead to predictions which can be tested by measuring of the folding rates of mutated variants of the 
hairpin. In the zipper mechanism, a mutation which increases the stability of the hydrophobic cluster is not expected 
to significantly alter the folding rate, since the contacts in the hydrophobic core are formed after the transition state. 
In the hydrophobic cluster mechanism, the folding rate should not depend on the type of residues in the turn region. 
In addition, assuming that cluster is partially formed at the transition state, this picture predicts that the folding 
rate should be larger for sequences containing stabler hydrophobic clusters. 

Experiments seem to be overall more consistent with the zipper mechanism. For example, measurements performed 
by Gai and co-workers |T5] have shown that the hairpin called trpzip4 (sequence GEWTWDDATKTWTWTE) , folds 
at half of the rate of GB1 although has the same turn sequence of GB1, and is 15-fold more stable. This result is 
inconsistent with the hydrophobic collapse mechanism. 

There are several possible reasons for the observed discrepancy between theory and experiment. On the one 
hand, the available force fields may not accurate enough to describe the long-time dynamics of a polypeptide chain. 
On the other hand, the algorithm used to generate folding trajectories from a given force field may be spoiled by 
wrong approximations or by the choice of some "bad" reaction coordinates. Finally, it is possible that the folding 
trajectories used to infer the folding mechanism do not form a statistically significant sample. The DRP method 
provides in principle a way to test directly the validity of a given force field. In fact, it yields by construction the most 
statically significant trajectories, without involving any a priori choice of reaction coordinate, and without relying on 
any ad-hoc assumption, other than the underlying Langevin Eq. 

Let us begin our discussion by inferring what mechanism is suggested by our MD simulations. The lowest panel of 
Fig{7] shows that the configurations which are visited when the MD trajectories leave (or return to ) the native state 
are characterized by an almost constant size of the hydrophobic cluster (between 0.2nm and 0.3nm, while the fraction 
of native contacts drops significantly (from 1 to 0.6). This feature is clearly suggestive of a hydrophobic collapse 
mechanism. 

The same conclusion could have been reached directly from the DRP calculation, without having to generate the 



free energy landscape by MD. In Fig 10 we compare the average evolution of the gyration radius of the residues in the 
hydro-phobic cluster Rh with the fraction of native contacts N c , as a function of the fraction of the total number of 
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FIG. 11: Sequence of the hairpin configurations along the dominant reaction pathways, 
folding is initiated by the formation of the contact between the hydrophobic residues. 



This sequence illustrates how the 



path steps 5 . Again, the average was performed over just 8 trajectories. We observe that, during the first 20% of the 
unfolding transition, the gyration radius of the hydrophobic core remains statistically consistent with its value in the 
native conformation. On the other hand, in the same part of the transition, the fraction of native constants drops by 
about 40%, in quantitative agreement with MD results. An example of hydrophobic collapse observed in a dominant 



transition is provided by the sequence of conformations presented in Fig. 11 



VIII. CONCLUSIONS 



In this paper, we have assessed the reliability of the DRP approach, when it is applied to study protein folding 
reactions. As a test problem, we studied the folding of 16-residue /3-hairpin, using a Go-type reduced model. 

First, we have provided evidence that the saddle-point approximation — which underlies the DRP approach — is 
working well, at room temperature. In fact, we have verified that the thermal fluctuations around the different 
dominant reaction pathways are small and do not destroy the picture provided by the lowest-order analysis. In 
addition, we have shown that fluctuations around different saddle-point paths do not significantly overlap. 

Next, we have addressed the question whether it is possible to characterize the folding reaction by averaging over 
a relatively small number of denatured configurations. To this end, we have confronted the predictions obtained with 
the DRP method with the results obtained by long MD simulations. We have found that the two methods lead to 
consistent pictures of the transition. By averaging over the dominant reaction pathways obtained from just 8 different 
initial conditions, we could accurately predict the location of the valley in the free energy landscape, leading to the 
native state. 

In the last part of this work, we have investigated the folding mechanism of a /3-hairpin suggested by the simple 
Go-type model under consideration. We have found that both the MD and DRP results support a hydrophobic 
collapse mechanism. In fact, in the initial (final) stage of the unfolding (folding) reaction, the size of the hydrophobic 
core remains constant and consistent with the value in the native state, while the number of native contacts rapidly 
drops (increases). 

Based on this work, we conclude that the folding reaction of a chain of amino-acids with native interactions and 
steric repulsion can be characterized in terms of just few dominant reaction pathways. This result supports and 
complements the outcome of a previous validation study of the DRP method, which was performed on a smaller 
molecule using an atomistic force field [2]. 



5 It should be stressed that the equations defining the dominant reaction pathways are invariant under time-reversal, therefore the folding 
and unfolding transitions are completely equivalent, in this formalism. 
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